Smart Paper Notes

Noise2Contrast: Multi-Contrast Fusion Enables Self-Supervised Tomographic Image Denoising

Fabian Wagner , Mareike Thies , Laura Pfaff , Noah Maul , Sabrina Pechmann , Mingxuan Gu , Jonas Utz , Oliver Aust , Daniela Weidner , Georgiana Neag

Category: Computer Vision

2022-12-09

Self-supervised image denoising techniques emerged as convenient methods that allow training denoising models without requiring ground-truth noise-free data. Existing methods usually optimize loss metrics that are calculated from multiple noisy realizations of similar images, e.g., from neighboring tomographic slices. However, those approaches fail to utilize the multiple contrasts that are routinely acquired in medical imaging modalities like MRI or dual-energy CT. In this work, we propose the new self-supervised training scheme Noise2Contrast that combines information from multiple measured image contrasts to train a denoising model. We stack denoising with domain-transfer operators to utilize the independent noise realizations of different image contrasts to derive a self-supervised loss. The trained denoising operator achieves convincing quantitative and qualitative results, outperforming state-of-the-art self-supervised methods by 4.7-11.0%/4.8-7.3% (PSNR/SSIM) on brain MRI data and by 43.6-50.5%/57.1-77.1% (PSNR/SSIM) on dual-energy CT X-ray microscopy data with respect to the noisy baseline. Our experiments on different real measured data sets indicate that Noise2Contrast training generalizes to other multi-contrast imaging modalities.

translated by Google Translate

Gradient-Based Geometry Learning for Fan-Beam CT Reconstruction

Mareike Thies , Fabian Wagner , Noah Maul , Lukas Folle , Manuela Meier , Maximilian Rohleder , Linda-Sophie Schneider , Laura Pfaff , Mingxuan Gu , Jonas Utz

Category: Computer Vision

2022-12-05

Incorporating computed tomography (CT) reconstruction operators into differentiable pipelines has proven beneficial in many applications. Such approaches usually focus on the projection data and keep the acquisition geometry fixed. However, precise knowledge of the acquisition geometry is essential for high-quality reconstruction results. In this paper, the differentiable formulation of fan-beam CT reconstruction is extended to the acquisition geometry. This allows gradient information to be propagated from a loss function on the reconstructed image into the geometry parameters. As a proof-of-concept experiment, this idea is applied to rigid motion compensation. The cost function is parameterized by a trained neural network which regresses an image quality metric from the motion-affected reconstruction alone. Using the proposed method, we are the first to optimize such an autofocus-inspired algorithm based on analytical gradients. The algorithm achieves a reduction in MSE by 35.5% and an improvement in SSIM by 12.6% over the motion-affected reconstruction. Besides motion compensation, we see further use cases of our differentiable method for scanner calibration or hybrid techniques employing deep models.

Trainable Joint Bilateral Filters for Enhanced Prediction Stability in Low-dose CT

Fabian Wagner , Mareike Thies , Felix Denzinger , Mingxuan Gu , Mayank Patwari , Stefan Ploner , Noah Maul , Laura Pfaff , Yixing Huang , Andreas Maier

Category: Computer Vision

2022-07-15

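Low-dose computed tomography (CT) denoising algorithms aim to enable reduced patient dose in routine CT acquisitions while maintaining high image quality. Recently, deep learning (DL)-based methods were introduced, outperforming conventional denoising algorithms on this task due to their high model capacity. However, for DL-based denoising to transition to clinical practice, these data-driven approaches must generalize robustly beyond the seen training data. We therefore propose a hybrid denoising approach consisting of a set of trainable joint bilateral filters (JBFs) combined with a convolutional DL-based denoising network that predicts the guidance image. Our proposed denoising pipeline combines the high model capacity enabled by DL-based feature extraction with the reliability of the conventional JBF. The pipeline's ability to generalize is demonstrated by training on abdomen CT scans without metal implants and testing on abdomen scans with metal implants as well as on head CT data. When two DL-based denoisers (RED-CNN/QAE) are embedded in our pipeline, the denoising performance improves by 10%/82% (RMSE) and 3%/81% (PSNR) in regions containing metal, and by 6%/78% (RMSE) and 2%/4% (PSNR) on head CT data, compared to the respective vanilla models. Finally, the proposed trainable JBFs limit the error bound of deep neural networks, facilitating the applicability of DL-based denoisers in low-dose CT pipelines.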
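To make the joint bilateral filter concrete, the following is a minimal NumPy sketch of the general technique, not the authors' implementation. It assumes a 2D image, fixed (non-trainable) Gaussian widths, and a guidance image supplied directly as an array, whereas the abstract describes trainable filters whose guidance image is predicted by a network. The function name and the parameters `sigma_spatial`, `sigma_range`, and `radius` are illustrative choices.

```python
import numpy as np


def joint_bilateral_filter(noisy, guide, sigma_spatial=1.5, sigma_range=0.1, radius=3):
    """Smooth `noisy` using weights computed from `guide` (2D arrays, same shape).

    Illustrative sketch: parameters are fixed here, not learned.
    """
    assert noisy.shape == guide.shape
    h, w = noisy.shape
    noisy_p = np.pad(noisy, radius, mode="reflect")
    guide_p = np.pad(guide, radius, mode="reflect")
    out = np.zeros_like(noisy, dtype=float)
    norm = np.zeros_like(noisy, dtype=float)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Spatial kernel: depends only on the pixel offset.
            spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_spatial ** 2))
            ys = slice(radius + dy, radius + dy + h)
            xs = slice(radius + dx, radius + dx + w)
            # Range kernel: intensity difference measured in the guide,
            # not in the noisy input. This is what makes the filter "joint".
            diff = guide_p[ys, xs] - guide
            weight = spatial * np.exp(-(diff ** 2) / (2 * sigma_range ** 2))
            out += weight * noisy_p[ys, xs]
            norm += weight
    return out / norm


# Usage: a noisy step edge filtered with a clean guide.
rng = np.random.default_rng(0)
clean = np.zeros((16, 16))
clean[:, 8:] = 1.0
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
denoised = joint_bilateral_filter(noisy, clean)
```

Because the range weights come from the guide, pixels on opposite sides of the edge barely mix, so the edge survives while noise within each flat region is averaged out.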

RFID-Cloud Integration for Smart Management of Public Car Parking Spaces

Umar Yahya , Ndawula Noah , Asingwire Hanifah , Lubega Faham , Abdal Kasule , Hamisi Ramadhan Mubarak

Category: Artificial Intelligence | Robotics

2022-12-25

Effective management of public shared spaces, such as car parking space, is a challenging aspect of transformation for many cities, especially in the developing world. By leveraging sensing technologies, cloud computing, and artificial intelligence, cities are increasingly being managed smartly. Smart cities not only bring convenience to city dwellers, but also improve their quality of life, as advocated by the United Nations in the 2030 Sustainable Development Goal on Sustainable Cities and Communities. Through integration of the Internet of Things and cloud computing, this paper presents a successful proof-of-concept implementation of a framework for managing public car parking spaces. Parking slots are reserved through a cloud-hosted application, while entry to and exit from the parking slot are enabled through Radio Frequency Identification (RFID) technology, which triggers a real-time update of the parking slot availability in the cloud-hosted database. This framework could bring considerable convenience to city dwellers, since motorists only have to drive to a parking space when sure of a vacant parking slot, an important stride towards the realization of sustainable smart cities and communities.
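The reserve-then-scan flow described above can be sketched as a small state machine. This is a hypothetical in-memory illustration only: the class `ParkingLot`, its method names, and the three slot states are assumptions, and a plain dictionary stands in for the cloud-hosted database mentioned in the abstract.

```python
from dataclasses import dataclass, field


@dataclass
class ParkingLot:
    """Hypothetical stand-in for the cloud-hosted slot database."""
    slots: dict = field(default_factory=dict)         # slot_id -> "vacant" | "reserved" | "occupied"
    reservations: dict = field(default_factory=dict)  # RFID tag_id -> slot_id

    def add_slot(self, slot_id):
        self.slots[slot_id] = "vacant"

    def reserve(self, tag_id):
        """Reserve the first vacant slot for a tag; return None if the lot is full."""
        for slot_id, state in self.slots.items():
            if state == "vacant":
                self.slots[slot_id] = "reserved"
                self.reservations[tag_id] = slot_id
                return slot_id
        return None

    def on_rfid_scan(self, tag_id):
        """Handle a gate scan: first scan is entry, second scan is exit."""
        slot_id = self.reservations.get(tag_id)
        if slot_id is None:
            return "denied"
        if self.slots[slot_id] == "reserved":
            self.slots[slot_id] = "occupied"
            return "entry"
        # The slot was occupied, so this scan is the vehicle leaving.
        self.slots[slot_id] = "vacant"
        del self.reservations[tag_id]
        return "exit"


# Usage: one slot, two motorists.
lot = ParkingLot()
lot.add_slot("A1")
first = lot.reserve("tag-1")
second = lot.reserve("tag-2")
```

In this sketch availability changes only on reservation and on RFID scans, which mirrors the abstract's point that a motorist drives to the lot only after a slot is known to be free.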

Parsel: A Unified Natural Language Framework for Algorithmic Reasoning

Eric Zelikman , Qian Huang , Gabriel Poesia , Noah D. Goodman , Nick Haber

Category: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-20

Despite recent success in large language model (LLM) reasoning, LLMs still struggle with hierarchical multi-step reasoning like generating complex programs. In these cases, humans often start with a high-level algorithmic design and implement each part gradually. We introduce Parsel, a framework enabling automatic implementation and validation of complex algorithms with code LLMs, based on hierarchical function descriptions in natural language. Parsel can be used across domains requiring hierarchical reasoning, e.g. code synthesis, theorem proving, and robotic planning. We demonstrate Parsel's capabilities by using it to generate complex programs that cannot currently be automatically implemented from one description and backtranslating Python programs in the APPS dataset. Beyond modeling capabilities, Parsel allows problem-solving with high-level algorithmic designs, benefiting both students and professional programmers.

Character-Aware Models Improve Visual Text Rendering

Rosanne Liu , Dan Garrette , Chitwan Saharia , William Chan , Adam Roberts , Sharan Narang , Irina Blok , RJ Mical , Mohammad Norouzi , Noah Constant

Category: Natural Language Processing | Computer Vision

2022-12-20

Current image generation models struggle to reliably produce well-formed visual text. In this paper, we investigate a key contributing factor: popular text-to-image models lack character-level input features, making it much harder to predict a word's visual makeup as a series of glyphs. To quantify the extent of this effect, we conduct a series of controlled experiments comparing character-aware vs. character-blind text encoders. In the text-only domain, we find that character-aware models provide large gains on a novel spelling task (WikiSpell). Transferring these learnings onto the visual domain, we train a suite of image generation models, and show that character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks (our DrawText benchmark). Our models set a much higher state-of-the-art on visual spelling, with 30+ point accuracy gains over competitors on rare words, despite training on far fewer examples.

Self-Instruct: Aligning Language Model with Self Generated Instructions

Yizhong Wang , Yeganeh Kordi , Swaroop Mishra , Alisa Liu , Noah A. Smith , Daniel Khashabi , Hannaneh Hajishirzi

Category: Natural Language Processing | Artificial Intelligence

2022-12-20

Large "instruction-tuned" language models (finetuned to respond to instructions) have demonstrated a remarkable ability to generalize zero-shot to new tasks. Nevertheless, they depend heavily on human-written instruction data that is limited in quantity, diversity, and creativity, therefore hindering the generality of the tuned model. We introduce Self-Instruct, a framework for improving the instruction-following capabilities of pretrained language models by bootstrapping off their own generations. Our pipeline generates instruction, input, and output samples from a language model, then prunes them before using them to finetune the original model. Applying our method to vanilla GPT3, we demonstrate a 33% absolute improvement over the original model on Super-NaturalInstructions, on par with the performance of InstructGPT_001, which is trained with private user data and human annotations. For further evaluation, we curate a set of expert-written instructions for novel tasks, and show through human evaluation that tuning GPT3 with Self-Instruct outperforms using existing public instruction datasets by a large margin, leaving only a 5% absolute gap behind InstructGPT_001. Self-Instruct provides an almost annotation-free method for aligning pre-trained language models with instructions, and we release our large synthetic dataset to facilitate future studies on instruction tuning.
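The pruning step of such a generate, prune, finetune loop might look like the sketch below. The abstract does not say how pruning is done, so everything here is an assumption: word-level Jaccard overlap is used as a simple stand-in similarity measure, and `jaccard`, `prune`, and the `max_similarity` threshold are hypothetical names and values.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings (assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def prune(candidates, pool, max_similarity=0.7):
    """Keep generated instructions that are non-empty and not near-duplicates.

    Each candidate is compared against the existing pool and against
    candidates that were already accepted in this call.
    """
    kept = list(pool)
    added = []
    for candidate in candidates:
        if candidate.strip() and all(jaccard(candidate, k) < max_similarity for k in kept):
            kept.append(candidate)
            added.append(candidate)
    return added


# Usage: one exact duplicate of the pool, one blank, one repeated new instruction.
pool = ["Translate the sentence into French."]
generated = [
    "Translate the sentence into French.",
    "Summarize the article in one line.",
    "  ",
    "Summarize the article in one line.",
]
new_instructions = prune(generated, pool)
```

Only the instructions that survive this filter would be passed on to finetuning, which keeps the synthetic set from collapsing onto a few repeated prompts.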

Task Ambiguity in Humans and Language Models

Alex Tamkin , Kunal Handa , Avash Shrestha , Noah Goodman

Category: Natural Language Processing | Machine Learning

2022-12-20

Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously-specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions with varying degrees of ambiguity, and 2) different numbers of labeled examples. We find that the combination of model scaling (to 175B parameters) and training with human feedback data enables models to approach or exceed the accuracy of human participants across tasks, but that either one alone is not sufficient. In addition, we show how to dramatically improve the accuracy of language models trained without large-scale human feedback training by finetuning on a small number of ambiguous in-context examples, providing a promising direction for teaching models to generalize well in the face of ambiguity.

One Embedder, Any Task: Instruction-Finetuned Text Embeddings

Hongjin Su , Weijia Shi* , Jungo Kasai , Yizhong Wang , Yushi Hu , Mari Ostendorf , Wen-tau Yih , Noah A. Smith , Luke Zettlemoyer , Tao Yu

Category: Natural Language Processing

2022-12-19

We introduce INSTRUCTOR, a new method for computing text embeddings given task instructions: every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). Unlike encoders from prior work that are more specialized, INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any further training. We first annotate instructions for 330 diverse tasks and train INSTRUCTOR on this multitask mixture with a contrastive loss. We evaluate INSTRUCTOR on 70 embedding evaluation tasks (66 of which are unseen during training), ranging from classification and information retrieval to semantic textual similarity and text generation evaluation. INSTRUCTOR, while having an order of magnitude fewer parameters than the previous best model, achieves state-of-the-art performance, with an average improvement of 3.4% compared to the previous best results on the 70 diverse datasets. Our analysis suggests that INSTRUCTOR is robust to changes in instructions, and that instruction finetuning mitigates the challenge of training a single model on diverse datasets.
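As an illustration of training embeddings with a contrastive loss, the sketch below computes an InfoNCE-style loss with in-batch negatives in NumPy. It is a generic formulation, not necessarily the exact objective used for INSTRUCTOR. The function name, the `temperature` value, and the assumption that each row of `queries` pairs with the same row of `positives` are all illustrative. In an instruction-based setup, each embedded input would be the instruction together with the text.

```python
import numpy as np


def info_nce_loss(queries, positives, temperature=0.05):
    """InfoNCE-style loss: row i of `queries` should match row i of `positives`.

    All other rows in the batch serve as negatives (illustrative sketch).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = q @ p.T / temperature                       # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of the matching pair, averaged over the batch.
    return -np.mean(np.diag(log_probs))


# Usage: correctly paired embeddings versus deliberately mismatched ones.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((4, 8))
aligned_loss = info_nce_loss(embeddings, embeddings)
mismatched_loss = info_nce_loss(embeddings, embeddings[::-1])
```

The loss is small when each query is most similar to its own positive and grows when the pairing is wrong, which is the signal that pulls matching pairs together during training.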

Feature Dropout: Revisiting the Role of Augmentations in Contrastive Learning

Alex Tamkin , Margalit Glasgow , Xiluo He , Noah Goodman

Category: Machine Learning | Computer Vision

2022-12-16

What role do augmentations play in contrastive learning? Recent work suggests that good augmentations are label-preserving with respect to a specific downstream task. We complicate this picture by showing that label-destroying augmentations can be useful in the foundation model setting, where the goal is to learn diverse, general-purpose representations for multiple downstream tasks. We perform contrastive learning experiments on a range of image and audio datasets with multiple downstream tasks (e.g. for digits superimposed on photographs, predicting the class of one vs. the other). We find that Viewmaker Networks, a recently proposed model for learning augmentations for contrastive learning, produce label-destroying augmentations that stochastically destroy features needed for different downstream tasks. These augmentations are interpretable (e.g. altering shapes, digits, or letters added to images) and surprisingly often result in better performance compared to expert-designed augmentations, despite not preserving label information. To support our empirical results, we theoretically analyze a simple contrastive learning setting with a linear model. In this setting, label-destroying augmentations are crucial for preventing one set of features from suppressing the learning of features useful for another downstream task. Our results highlight the need for analyzing the interaction between multiple downstream tasks when trying to explain the success of foundation models.
